275 research outputs found

    On the Hardness and Inapproximability of Recognizing Wheeler Graphs

    Get PDF
    In recent years several compressed indexes based on variants of the Burrows-Wheeler transformation have been introduced. Some of these are used to index structures far more complex than a single string, as was originally done with the FM-index [Ferragina and Manzini, J. ACM 2005]. As such, there has been an increasing effort to better understand under which conditions such an indexing scheme is possible. This has led to the introduction of Wheeler graphs [Gagie et al., Theor. Comput. Sci., 2017]. Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT related structures can be represented as Wheeler graphs, and that Wheeler graphs can be indexed in a way which is space efficient. Hence, being able to recognize whether a given graph is a Wheeler graph, or being able to approximate a given graph by a Wheeler graph, could have numerous applications in indexing. Here we resolve the open question of whether there exists an efficient algorithm for recognizing if a given graph is a Wheeler graph. We present: - The problem of recognizing whether a given graph G=(V,E) is a Wheeler graph is NP-complete for any edge label alphabet of size sigma >= 2, even when G is a DAG. This holds even on a restricted, subset of graphs called d-NFA\u27s for d >= 5. This is in contrast to recent results demonstrating the problem can be solved in polynomial time for d-NFA\u27s where d <= 2. We also show the recognition problem can be solved in linear time for sigma =1; - There exists an 2^{e log sigma + O(n + e)} time exact algorithm where n = |V| and e = |E|. This algorithm relies on graph isomorphism being computable in strictly sub-exponential time; - We define an optimization variant of the problem called Wheeler Graph Violation, abbreviated WGV, where the aim is to remove the minimum number of edges in order to obtain a Wheeler graph. We show WGV is APX-hard, even when G is a DAG, implying there exists a constant C >= 1 for which there is no C-approximation algorithm (unless P = NP). Also, conditioned on the Unique Games Conjecture, for all C >= 1, it is NP-hard to find a C-approximation; - We define the Wheeler Subgraph problem, abbreviated WS, where the aim is to find the largest subgraph which is a Wheeler Graph (the dual of the WGV). In contrast to WGV, we prove that the WS problem is in APX for sigma=O(1); The above findings suggest that most problems under this theme are computationally difficult. However, we identify a class of graphs for which the recognition problem is polynomial time solvable, raising the open question of which parameters determine this problem\u27s difficulty

    Finding an Optimal Alphabet Ordering for Lyndon Factorization Is Hard

    Get PDF

    Compressibility-Aware Quantum Algorithms on Strings

    Full text link
    Sublinear time quantum algorithms have been established for many fundamental problems on strings. This work demonstrates that new, faster quantum algorithms can be designed when the string is highly compressible. We focus on two popular and theoretically significant compression algorithms -- the Lempel-Ziv77 algorithm (LZ77) and the Run-length-encoded Burrows-Wheeler Transform (RL-BWT), and obtain the results below. We first provide a quantum algorithm running in O~(zn)\tilde{O}(\sqrt{zn}) time for finding the LZ77 factorization of an input string T[1..n]T[1..n] with zz factors. Combined with multiple existing results, this yields an O~(rn)\tilde{O}(\sqrt{rn}) time quantum algorithm for finding the RL-BWT encoding with rr BWT runs. Note that r=Θ~(z)r = \tilde{\Theta}(z). We complement these results with lower bounds proving that our algorithms are optimal (up to polylog factors). Next, we study the problem of compressed indexing, where we provide a O~(rn)\tilde{O}(\sqrt{rn}) time quantum algorithm for constructing a recently designed O~(r)\tilde{O}(r) space structure with equivalent capabilities as the suffix tree. This data structure is then applied to numerous problems to obtain sublinear time quantum algorithms when the input is highly compressible. For example, we show that the longest common substring of two strings of total length nn can be computed in O~(zn)\tilde{O}(\sqrt{zn}) time, where zz is the number of factors in the LZ77 factorization of their concatenation. This beats the best known O~(n23)\tilde{O}(n^\frac{2}{3}) time quantum algorithm when zz is sufficiently small

    Conformal blocks on smoothings via mode transition algebras

    Full text link
    Here we define a series of associative algebras attached to a vertex operator algebra VV, called mode transition algebras, showing they reflect both algebraic properties of VV and geometric constructions on moduli of curves. One can define sheaves of coinvariants on pointed coordinatized curves from VV-modules. We show that if the mode transition algebras admit multiplicative identities with certain properties, these sheaves deform as wanted on families of curves with nodes (so VV satisfies smoothing). Consequently, coherent sheaves of coinvariants defined by vertex operator algebras that satisfy smoothing form vector bundles. We also show that mode transition algebras give information about higher level Zhu algebras and generalized Verma modules. As an application, we completely describe higher level Zhu algebras of the Heisenberg vertex algebra for all levels, proving a conjecture of Addabbo--Barron.Comment: 56 page

    Algorithms and Lower Bounds for Ordering Problems on Strings

    Get PDF
    This dissertation presents novel algorithms and conditional lower bounds for a collection of string and text-compression-related problems. These results are unified under the theme of ordering constraint satisfaction. Utilizing the connections to ordering constraint satisfaction, we provide hardness results and algorithms for the following: recognizing a type of labeled graph amenable to text-indexing known as Wheeler graphs, minimizing the number of maximal unary substrings occurring in the Burrows-Wheeler Transformation of a text, minimizing the number of factors occurring in the Lyndon factorization of a text, and finding an optimal reference string for relative Lempel-Ziv encoding

    Feasibility of Flow Decomposition with Subpath Constraints in Linear Time

    Get PDF

    On the Complexity of BWT-Runs Minimization via Alphabet Reordering

    Get PDF
    The Burrows-Wheeler Transform (BWT) has been an essential tool in text compression and indexing. First introduced in 1994, it went on to provide the backbone for the first encoding of the classic suffix tree data structure in space close to the entropy-based lower bound. Recently, there has been the development of compact suffix trees in space proportional to "rr", the number of runs in the BWT, as well as the appearance of rr in the time complexity of new algorithms. Unlike other popular measures of compression, the parameter rr is sensitive to the lexicographic ordering given to the text's alphabet. Despite several past attempts to exploit this, a provably efficient algorithm for finding, or approximating, an alphabet ordering which minimizes rr has been open for years. We present the first set of results on the computational complexity of minimizing BWT-runs via alphabet reordering. We prove that the decision version of this problem is NP-complete and cannot be solved in time 2o(σ+n)2^{o(\sigma + \sqrt{n})} unless the Exponential Time Hypothesis fails, where σ\sigma is the size of the alphabet and nn is the length of the text. We also show that the optimization problem is APX-hard. In doing so, we relate two previously disparate topics: the optimal traveling salesperson path and the number of runs in the BWT of a text, providing a surprising connection between problems on graphs and text compression. Also, by relating recent results in the field of dictionary compression, we illustrate that an arbitrary alphabet ordering provides a O(log2n)O(\log^2 n)-approximation. We provide an optimal linear-time algorithm for the problem of finding a run minimizing ordering on a subset of symbols (occurring only once) under ordering constraints, and prove a generalization of this problem to a class of graphs with BWT like properties called Wheeler graphs is NP-complete

    The Fine-Grained Complexity of Median and Center String Problems Under Edit Distance

    Get PDF
    We present the first fine-grained complexity results on two classic problems on strings. The first one is the k-Median-Edit-Distance problem, where the input is a collection of k strings, each of length at most n, and the task is to find a new string that minimizes the sum of the edit distances from itself to all other strings in the input. Arising frequently in computational biology, this problem provides an important generalization of edit distance to multiple strings and is similar to the multiple sequence alignment problem in bioinformatics. We demonstrate that for any ? > 0 and k ? 2, an O(n^{k-?}) time solution for the k-Median-Edit-Distance problem over an alphabet of size O(k) refutes the Strong Exponential Time Hypothesis (SETH). This provides the first matching conditional lower bound for the O(n^k) time algorithm established in 1975 by Sankoff. The second problem we study is the k-Center-Edit-Distance problem. Here also, the input is a collection of k strings, each of length at most n. The task is to find a new string that minimizes the maximum edit distance from itself to any other string in the input. We prove that the same conditional lower bound as before holds. Our results also imply new conditional lower bounds for the k-Tree-Alignment and the k-Bottleneck-Tree-Alignment problems studied in phylogenetics
    corecore